A Neural Network Language Document Representation Technique for Web-Page Classification

Similar resources

Document Representations for Classification of Short Web-Page Descriptions

Motivated by applying Text Categorization to classification of Web search results, this paper describes an extensive experimental study of the impact of bag-of-words document representations on the performance of five major classifiers – Naïve Bayes, SVM, Voted Perceptron, kNN and C4.5. The texts, representing short Web-page descriptions sorted into a large hierarchy of topics, are taken from th...

Arabic Script Web Document Language Identifications Using Neural Network

This paper presents experiments in identifying the language of Arabic-script web documents using a neural network. Identifying languages written in Arabic script, such as Persian, Turkish, Urdu, and Jawi, poses some difficulties. Since a vast amount of information is presented to internet users, it is crucial to find an appropriate method of language identification for a variety of tex...

A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...

The hybrid representation model for web document classification

Most web content categorization methods are based on the vector-space model of information retrieval. One of the most important advantages of this representation model is that it can be used by both instance-based and model-based classifiers. However, this popular method of document representation does not capture important structural information, such as the order and proximity of word occurre...

Incremental Document Clustering for Web Page Classification

Motivated by the benefits in organizing the documents in Web search engines, we consider the problem of automatic Web page classification. We employ the clustering techniques. Each document is represented by a feature vector. By analyzing the clusters formed by these vectors, we can assign the documents within the same cluster to the same class automatically. Our contributions are the following: ...

Journal

Journal title: International Journal of Computer Applications

Year: 2020

ISSN: 0975-8887

DOI: 10.5120/ijca2020920071